Overview

Dataset statistics

Number of variables              11
Number of observations           329237
Missing cells                    0
Missing cells (%)                0.0%
Duplicate rows                   0
Duplicate rows (%)               0.0%
Total size in memory             27.6 MiB
Average record size in memory    88.0 B
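The memory figures above are consistent with every value being stored as an 8-byte number. The report does not state the column dtypes, so the storage width in this short sketch is an assumption; the variable and observation counts are taken from the table.

```python
# Assumption: each of the 11 numeric columns stores 8 bytes per value
# (for example int64 or float64). The report does not state the dtypes.
N_VARIABLES = 11
N_OBSERVATIONS = 329237
BYTES_PER_VALUE = 8  # assumed storage width
MIB = 1024 ** 2

record_bytes = N_VARIABLES * BYTES_PER_VALUE        # bytes per row
column_mib = N_OBSERVATIONS * BYTES_PER_VALUE / MIB  # one column
total_mib = N_OBSERVATIONS * record_bytes / MIB      # whole dataset

print(record_bytes)          # 88
print(round(column_mib, 1))  # 2.5
print(round(total_mib, 1))   # 27.6
```

Under that assumption the sketch reproduces the 88.0 B average record size, the 27.6 MiB total, and the 2.5 MiB memory size reported for each variable.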

Variable types

NUM    11

Reproduction

Analysis started          2020-08-25 23:38:41.004779
Analysis finished         2020-08-25 23:40:31.973569
Duration                  1 minute and 50.97 seconds
Version                   pandas-profiling v2.8.0
Command line              pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration    config.yaml

Variables

year
Real number (ℝ≥0)

Distinct count    29
Unique (%)        < 0.1%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              2002.0
Minimum           1988
Maximum           2016
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      1988
5-th percentile              1989
Q1                           1995
median                       2002
Q3                           2009
95-th percentile             2015
Maximum                      2016
Range                        28
Interquartile range (IQR)    14

Descriptive statistics

Standard deviation                 8.366612971
Coefficient of variation (CV)      0.004179127358
Kurtosis                           -1.202857186
Mean                               2002
Median Absolute Deviation (MAD)    7
Skewness                           0
Sum                                659132474
Variance                           70.00021261
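The descriptive statistics for year can be cross-checked without the data. The sketch assumes a balanced panel in which each of the 29 years appears exactly 11353 times (29 × 11353 = 329237), which is what the value counts for this variable suggest. It also assumes the report uses the sample variance with n - 1 in the denominator.

```python
import math

# Assumption: a balanced panel, i.e. each of the 29 years 1988..2016
# appears exactly 11353 times (one row per ZIP code per year).
years = range(1988, 2017)
rows_per_year = 11353
n = rows_per_year * len(years)

mean = sum(years) / len(years)
# Every year has the same weight, so the squared deviations scale by it.
sum_sq_dev = rows_per_year * sum((y - mean) ** 2 for y in years)

variance = sum_sq_dev / (n - 1)  # sample variance (assumed ddof=1)
std = math.sqrt(variance)
cv = std / mean                  # coefficient of variation
total = rows_per_year * sum(years)

print(n, mean, total)
print(variance, std, cv)
```

The results agree with the reported variance, standard deviation, coefficient of variation, and sum to the printed precision, which supports reading year as a uniformly covered panel dimension rather than a measured quantity.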
Histogram with fixed size bins (bins=10)
Value                Count     Frequency (%)
2016                 11353     3.4%
2001                 11353     3.4%
1989                 11353     3.4%
1990                 11353     3.4%
1991                 11353     3.4%
1992                 11353     3.4%
1993                 11353     3.4%
1994                 11353     3.4%
1995                 11353     3.4%
1996                 11353     3.4%
Other values (19)    215707    65.5%

Value    Count    Frequency (%)
1988     11353    3.4%
1989     11353    3.4%
1990     11353    3.4%
1991     11353    3.4%
1992     11353    3.4%

Value    Count    Frequency (%)
2016     11353    3.4%
2015     11353    3.4%
2014     11353    3.4%
2013     11353    3.4%
2012     11353    3.4%

zipcode
Real number (ℝ≥0)

Distinct count    11353
Unique (%)        3.4%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              45295.89747203382
Minimum           1001
Maximum           99901
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      1001
5-th percentile              6001
Q1                           19425
median                       39475
Q3                           72370
95-th percentile             95811
Maximum                      99901
Range                        98900
Interquartile range (IQR)    52945

Descriptive statistics

Standard deviation                 29526.88225
Coefficient of variation (CV)      0.6518665905
Kurtosis                           -1.154559594
Mean                               45295.89747
Median Absolute Deviation (MAD)    24662
Skewness                           0.3152461507
Sum                                1.49130854e+10
Variance                           871836775.4
Histogram with fixed size bins (bins=10)
Value                   Count     Frequency (%)
28658                   29        < 0.1%
70775                   29        < 0.1%
32837                   29        < 0.1%
45123                   29        < 0.1%
47170                   29        < 0.1%
92024                   29        < 0.1%
32603                   29        < 0.1%
30554                   29        < 0.1%
26456                   29        < 0.1%
1876                    29        < 0.1%
Other values (11343)    328947    99.9%

Value    Count    Frequency (%)
1001     29       < 0.1%
1002     29       < 0.1%
1005     29       < 0.1%
1007     29       < 0.1%
1010     29       < 0.1%

Value    Count    Frequency (%)
99901    29       < 0.1%
99801    29       < 0.1%
99709    29       < 0.1%
99701    29       < 0.1%
99577    29       < 0.1%

EQI_zip
Real number (ℝ≥0)

Distinct count    322573
Unique (%)        98.0%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              0.0005261158443904059
Minimum           1.1998789e-05
Maximum           0.0641894
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      1.1998789e-05
5-th percentile              0.000116825016
Q1                           0.00021236634
median                       0.00032423303
Q3                           0.00052009925
95-th percentile             0.00138488888
Maximum                      0.0641894
Range                        0.06417740121
Interquartile range (IQR)    0.00030773291

Descriptive statistics

Standard deviation                 0.000996061107
Coefficient of variation (CV)      1.893235335
Kurtosis                           457.1446999
Mean                               0.0005261158444
Median Absolute Deviation (MAD)    0.00013423315
Skewness                           15.37889315
Sum                                173.2168023
Variance                           9.921377288e-07
Histogram with fixed size bins (bins=10)
Value                    Count     Frequency (%)
0.0004910004             58        < 0.1%
0.0011926616             46        < 0.1%
0.00023086638            46        < 0.1%
0.0003055213             44        < 0.1%
0.0016380961             42        < 0.1%
0.0006648359             41        < 0.1%
0.00022039752            39        < 0.1%
0.0006497198             36        < 0.1%
0.00073148106            32        < 0.1%
0.0001871098             31        < 0.1%
Other values (322563)    328822    99.9%

Value            Count    Frequency (%)
1.1998789e-05    1        < 0.1%
1.3727406e-05    1        < 0.1%
1.4215268e-05    1        < 0.1%
1.4599402e-05    1        < 0.1%
1.4817676e-05    1        < 0.1%
 
ValueCountFrequency (%) 
0.06418941< 0.1%
 
0.0577951371< 0.1%
 
0.056522611< 0.1%
 
0.054629111< 0.1%
 
0.05155551< 0.1%
 

SFR_zip
Real number (ℝ≥0)

Distinct count    1528
Unique (%)        0.5%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              85.59444108651215
Minimum           1.0
Maximum           6883.0
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      1
5-th percentile              4
Q1                           14
median                       38
Q3                           100
95-th percentile             320
Maximum                      6883
Range                        6882
Interquartile range (IQR)    86

Descriptive statistics

Standard deviation                 137.9164631
Coefficient of variation (CV)      1.611278272
Kurtosis                           81.24951283
Mean                               85.59444109
Median Absolute Deviation (MAD)    29
Skewness                           5.728550577
Sum                                28180857
Variance                           19020.9508
Histogram with fixed size bins (bins=10)
Value                  Count     Frequency (%)
5                      6979      2.1%
6                      6943      2.1%
7                      6915      2.1%
8                      6815      2.1%
4                      6556      2.0%
9                      6510      2.0%
10                     6427      2.0%
3                      6146      1.9%
11                     5962      1.8%
12                     5723      1.7%
Other values (1518)    264261    80.3%

Value    Count    Frequency (%)
1        3341     1.0%
2        5085     1.5%
3        6146     1.9%
4        6556     2.0%
5        6979     2.1%

Value    Count    Frequency (%)
6883     1        < 0.1%
5858     1        < 0.1%
4307     1        < 0.1%
4201     1        < 0.1%
4103     1        < 0.1%

RECPI_zip
Real number (ℝ≥0)

Distinct count    323968
Unique (%)        98.4%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              0.04431950449455612
Minimum           1.1998789e-05
Maximum           9.541773
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      1.1998789e-05
5-th percentile              0.00099471418
Q1                           0.0042709424
median                       0.013481382
Q3                           0.038867597
95-th percentile             0.164759802
Maximum                      9.541773
Range                        9.541761001
Interquartile range (IQR)    0.0345966546

Descriptive statistics

Standard deviation                 0.1388480899
Coefficient of variation (CV)      3.132889041
Kurtosis                           612.761959
Mean                               0.04431950449
Median Absolute Deviation (MAD)    0.011116123
Skewness                           18.2065209
Sum                                14591.6207
Variance                           0.01927879208
Histogram with fixed size bins (bins=10)
Value                    Count     Frequency (%)
0.0004910004             42        < 0.1%
0.0003055213             35        < 0.1%
0.00023086638            35        < 0.1%
0.0011926616             34        < 0.1%
0.0006648359             32        < 0.1%
0.0016380961             32        < 0.1%
0.00022039752            31        < 0.1%
0.0006497198             23        < 0.1%
0.0001871098             23        < 0.1%
0.00073148106            22        < 0.1%
Other values (323958)    328928    99.9%

Value            Count    Frequency (%)
1.1998789e-05    1        < 0.1%
1.3727406e-05    1        < 0.1%
1.4817676e-05    1        < 0.1%
1.8027562e-05    1        < 0.1%
1.8461504e-05    2        < 0.1%

Value        Count    Frequency (%)
9.541773     1        < 0.1%
9.173729     1        < 0.1%
8.419393     1        < 0.1%
7.458536     1        < 0.1%
7.4182096    1        < 0.1%

EQI_MSA
Real number (ℝ≥0)

Distinct count    21457
Unique (%)        6.5%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              0.0005672011306991286
Minimum           2.37131e-05
Maximum           0.015639344
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      2.37131e-05
5-th percentile              0.00014551292
Q1                           0.00025281747
median                       0.0003858068
Q3                           0.0005928581
95-th percentile             0.00166034
Maximum                      0.015639344
Range                        0.0156156309
Interquartile range (IQR)    0.00034004063

Descriptive statistics

Standard deviation                 0.0006931956941
Coefficient of variation (CV)      1.222133837
Kurtosis                           54.77488203
Mean                               0.0005672011307
Median Absolute Deviation (MAD)    0.00015247406
Skewness                           5.738463797
Sum                                186.7435987
Variance                           4.805202704e-07
Histogram with fixed size bins (bins=10)
Value                   Count     Frequency (%)
0.00044274333           863       0.3%
0.00026626285           863       0.3%
0.0002870331            863       0.3%
0.00037916107           863       0.3%
0.0002569714            863       0.3%
0.00045216616           863       0.3%
0.00044633303           863       0.3%
0.00032764766           863       0.3%
0.00031646717           863       0.3%
0.00046687486           863       0.3%
Other values (21447)    320607    97.4%

Value            Count    Frequency (%)
2.37131e-05      2        < 0.1%
2.7167029e-05    1        < 0.1%
2.9056831e-05    6        < 0.1%
3.2550477e-05    1        < 0.1%
3.9400034e-05    2        < 0.1%
 
ValueCountFrequency (%) 
0.01563934413< 0.1%
 
0.01031073652590.1%
 
0.008045252< 0.1%
 
0.00751865358< 0.1%
 
0.00700681662< 0.1%
 

SFR_MSA
Real number (ℝ≥0)

Distinct count    3482
Unique (%)        1.1%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              9709.09237114905
Minimum           1.0
Maximum           153589.0
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      1
5-th percentile              64
Q1                           376
median                       1819
Q3                           9893
95-th percentile             51571
Maximum                      153589
Range                        153588
Interquartile range (IQR)    9517

Descriptive statistics

Standard deviation                 17734.4361
Coefficient of variation (CV)      1.826580222
Kurtosis                           14.08393759
Mean                               9709.092371
Median Absolute Deviation (MAD)    1704
Skewness                           3.219428025
Sum                                3196592445
Variance                           314510223.8
Histogram with fixed size bins (bins=10)
Value                  Count     Frequency (%)
1                      2986      0.9%
50318                  1726      0.5%
30536                  863       0.3%
31112                  863       0.3%
41773                  863       0.3%
53837                  863       0.3%
61932                  863       0.3%
21288                  863       0.3%
58398                  863       0.3%
24392                  863       0.3%
Other values (3472)    317621    96.5%

Value    Count    Frequency (%)
1        2986     0.9%
2        172      0.1%
3        95       < 0.1%
4        122      < 0.1%
5        138      < 0.1%

Value     Count    Frequency (%)
153589    183      0.1%
148759    183      0.1%
140521    183      0.1%
132036    183      0.1%
126138    183      0.1%

RECPI_MSA
Real number (ℝ≥0)

Distinct count    21464
Unique (%)        6.5%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              5.263927133687047
Minimum           2.37131e-05
Maximum           132.1134
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      2.37131e-05
5-th percentile              0.017739179
Q1                           0.12207872
median                       0.70655054
Q3                           5.6311555
95-th percentile             24.299002
Maximum                      132.1134
Range                        132.1133763
Interquartile range (IQR)    5.50907678

Descriptive statistics

Standard deviation                 10.39191866
Coefficient of variation (CV)      1.974176009
Kurtosis                           30.34050581
Mean                               5.263927134
Median Absolute Deviation (MAD)    0.6749912
Skewness                           4.281911243
Sum                                1733079.578
Variance                           107.9919734
Histogram with fixed size bins (bins=10)
Value                   Count     Frequency (%)
12.0032215              863       0.3%
12.930286               863       0.3%
13.89656                863       0.3%
11.020869               863       0.3%
13.122136               863       0.3%
13.04385                863       0.3%
16.727505               863       0.3%
9.818923                863       0.3%
12.650854               863       0.3%
10.166127               863       0.3%
Other values (21454)    320607    97.4%

Value            Count    Frequency (%)
2.37131e-05      2        < 0.1%
2.9056831e-05    6        < 0.1%
3.2550477e-05    1        < 0.1%
4.0748477e-05    6        < 0.1%
4.1634128e-05    22       < 0.1%

Value         Count    Frequency (%)
132.1134      145      < 0.1%
117.37917     145      < 0.1%
114.993835    145      < 0.1%
98.44064      145      < 0.1%
84.49912      145      < 0.1%

EQI_state
Real number (ℝ≥0)

Distinct count    1363
Unique (%)        0.4%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              0.0006146994549785534
Minimum           7.426358e-05
Maximum           0.0037453347
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      7.426358e-05
5-th percentile              0.00017058598
Q1                           0.00030582474
median                       0.0004511815
Q3                           0.0006532353
95-th percentile             0.0019132774
Maximum                      0.0037453347
Range                        0.00367107112
Interquartile range (IQR)    0.00034741056

Descriptive statistics

Standard deviation                 0.0005241844583
Coefficient of variation (CV)      0.8527491834
Kurtosis                           4.9561615
Mean                               0.000614699455
Median Absolute Deviation (MAD)    0.00016089047
Skewness                           2.192916245
Sum                                202.3818045
Variance                           2.747693463e-07
Histogram with fixed size bins (bins=10)
Value                  Count     Frequency (%)
0.00031841442          918       0.3%
0.00042931232          918       0.3%
0.0004341649           918       0.3%
0.00030589235          918       0.3%
0.000318638            918       0.3%
0.00045583185          918       0.3%
0.0003402935           918       0.3%
0.00046983073          918       0.3%
0.00041370557          918       0.3%
0.0003624113           918       0.3%
Other values (1353)    320057    97.2%

Value            Count    Frequency (%)
7.426358e-05     103      < 0.1%
7.8000994e-05    103      < 0.1%
7.82489e-05      103      < 0.1%
8.127372e-05     103      < 0.1%
8.188262e-05     11       < 0.1%

Value           Count    Frequency (%)
0.0037453347    408      0.1%
0.0030291597    408      0.1%
0.002947922     408      0.1%
0.0027841919    707      0.2%
0.002605618     707      0.2%

SFR_state
Real number (ℝ≥0)

Distinct count    1346
Unique (%)        0.4%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              53897.785176028214
Minimum           40.0
Maximum           330536.0
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      40
5-th percentile              4881
Q1                           16601
median                       35042
Q3                           68229
95-th percentile             170046
Maximum                      330536
Range                        330496
Interquartile range (IQR)    51628

Descriptive statistics

Standard deviation                 57447.41025
Coefficient of variation (CV)      1.065858459
Kurtosis                           5.227866987
Mean                               53897.78518
Median Absolute Deviation (MAD)    21999
Skewness                           2.153744281
Sum                                1.77451451e+10
Variance                           3300204945
Histogram with fixed size bins (bins=10)
Value                  Count     Frequency (%)
113623                 1648      0.5%
18675                  1085      0.3%
114296                 918       0.3%
55476                  918       0.3%
134403                 918       0.3%
111622                 918       0.3%
60950                  918       0.3%
111615                 918       0.3%
76140                  918       0.3%
128726                 918       0.3%
Other values (1336)    319160    96.9%

Value    Count    Frequency (%)
40       1        < 0.1%
63       1        < 0.1%
65       1        < 0.1%
71       1        < 0.1%
94       1        < 0.1%

Value     Count    Frequency (%)
330536    800      0.2%
313960    800      0.2%
295716    800      0.2%
287591    800      0.2%
276889    800      0.2%

RECPI_state
Real number (ℝ≥0)

Distinct count    1363
Unique (%)        0.4%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              34.57783452332148
Minimum           0.01783298
Maximum           442.21994000000007
Zeros             0
Zeros (%)         0.0%
Memory size       2.5 MiB

Quantile statistics

Minimum                      0.01783298
5-th percentile              1.5803634
Q1                           6.971263
median                       16.283487
Q3                           35.750328
95-th percentile             93.418846
Maximum                      442.21994
Range                        442.202107
Interquartile range (IQR)    28.779065

Descriptive statistics

Standard deviation                 58.46623664
Coefficient of variation (CV)      1.690858825
Kurtosis                           18.81983124
Mean                               34.57783452
Median Absolute Deviation (MAD)    11.3845894
Skewness                           4.057308626
Sum                                11384302.5
Variance                           3418.300827
Histogram with fixed size bins (bins=10)
Value                  Count     Frequency (%)
34.229965              918       0.3%
56.184113              918       0.3%
24.070673              918       0.3%
30.707523              918       0.3%
39.853615              918       0.3%
27.86546               918       0.3%
32.96178               918       0.3%
34.402954              918       0.3%
32.540657              918       0.3%
37.404045              918       0.3%
Other values (1353)    320057    97.2%
 
ValueCountFrequency (%) 
0.0178329811< 0.1%
 
0.0276760614< 0.1%
 
0.03167600611< 0.1%
 
0.0324715911< 0.1%
 
0.03350402811< 0.1%
 
Value        Count    Frequency (%)
442.21994    707      0.2%
424.64432    707      0.2%
397.9958     707      0.2%
345.09607    707      0.2%
308.3075     707      0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
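The definition above can be written out directly. This is a minimal sketch in plain Python, not the code pandas-profiling runs; the function name `pearson_r` and the toy data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: three exactly linear relationships with different slopes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [10.0 * v + 3.0 for v in x]    # steep positive slope
shallow = [0.1 * v - 7.0 for v in x]   # shallow positive slope
falling = [-2.0 * v for v in x]        # negative slope

print(pearson_r(x, steep), pearson_r(x, shallow), pearson_r(x, falling))
```

Both positive slopes give r = 1 and the negative slope gives r = -1 (up to floating-point rounding), which illustrates the invariance under changes of location and scale.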

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
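As a sketch of that recipe, ρ is Pearson's r applied to the ranks. The helper names below are illustrative, and averaging the ranks of tied values is an assumption about tie handling that the text above does not specify.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.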

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
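The pair-counting definition translates into a short sketch. It implements the formula exactly as stated (often called tau-a); library implementations frequently use a tie-adjusted variant instead, so results may differ when the data contain ties. The function name and data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie in x or y and counts toward neither group
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4
```

The double loop is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this size.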

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

Sample

First rows

     year  zipcode  EQI_zip   SFR_zip  RECPI_zip  EQI_MSA   SFR_MSA  RECPI_MSA  EQI_state  SFR_state  RECPI_state
0    1988  1001     0.000815  48.0     0.039108   0.001021  1235.0   1.260888   0.001476   17558.0    25.921940
1    1989  1001     0.001116  44.0     0.049100   0.001168  1049.0   1.225384   0.001751   15343.0    26.866861
2    1990  1001     0.001629  45.0     0.073317   0.001243  841.0    1.045161   0.001857   13556.0    25.172453
3    1991  1001     0.000826  27.0     0.022298   0.001375  714.0    0.981724   0.001823   12798.0    23.330479
4    1992  1001     0.002216  22.0     0.048744   0.001549  760.0    1.176877   0.002111   13289.0    28.052156
5    1993  1001     0.000827  27.0     0.022334   0.001266  824.0    1.043205   0.002057   14110.0    29.028145
6    1994  1001     0.004772  33.0     0.157489   0.001394  804.0    1.121105   0.001996   14843.0    29.624968
7    1995  1001     0.001065  23.0     0.024496   0.001667  817.0    1.362325   0.002023   15180.0    30.703022
8    1996  1001     0.001971  29.0     0.057166   0.001361  884.0    1.203223   0.002326   16520.0    38.430626
9    1997  1001     0.000840  42.0     0.035283   0.001233  902.0    1.112023   0.002386   17254.0    41.174500

Last rows

        year  zipcode  EQI_zip   SFR_zip  RECPI_zip  EQI_MSA   SFR_MSA  RECPI_MSA  EQI_state  SFR_state  RECPI_state
329227  2007  99901    0.000093  27.0     0.002516   0.000094  32.0     0.003011   0.000126   1356.0     0.170988
329228  2008  99901    0.000120  27.0     0.003237   0.000117  28.0     0.003283   0.000120   1214.0     0.145699
329229  2009  99901    0.000096  21.0     0.002015   0.000096  21.0     0.002015   0.000101   1190.0     0.120600
329230  2010  99901    0.000093  30.0     0.002782   0.000086  36.0     0.003103   0.000112   1336.0     0.149195
329231  2011  99901    0.000080  35.0     0.002795   0.000078  36.0     0.002824   0.000111   1662.0     0.185014
329232  2012  99901    0.000107  30.0     0.003195   0.000100  36.0     0.003602   0.000123   1708.0     0.209606
329233  2013  99901    0.000056  26.0     0.001469   0.000060  30.0     0.001810   0.000088   2114.0     0.185809
329234  2014  99901    0.000070  32.0     0.002242   0.000069  34.0     0.002346   0.000088   2260.0     0.198317
329235  2015  99901    0.000077  50.0     0.003867   0.000076  52.0     0.003957   0.000093   3179.0     0.295212
329236  2016  99901    0.000076  47.0     0.003571   0.000073  54.0     0.003929   0.000082   3847.0     0.315002